1. Introduction

We  present  a  floating-point  precision  modeling 
methodology  that  can  be  used  to  develop  application 
adaptive  arithmetic  precision  models  for  variable  bit- 
width  floating-point  computing.  We  also  developed 
optimization  algorithms  that  minimize  the  total  bit-width 
for  the  application  such  that  the  output  accuracy  meets 
user-defined  requirements.  The  methodology  supports 
different  bit-widths  for  different  variables  in  the  datapath. 

Computing with floating-point (FP) representations
provides a wide dynamic range of real numbers, freeing
programmers from writing the manual scaling code
required for fixed-point representation. Nevertheless,
floating-point operations have long been considered
beyond the capabilities of custom or reconfigurable
hardware implementations: IEEE standard-precision
floating-point operations cost too much in power and area
to be practical on many devices.

A promising way to reduce the cost of FP
implementation is to reduce the bit-width of the FP
representation. Research results show that it is feasible
and beneficial to use reduced bit-width FP representations
in modern multimedia and streaming application
workloads [1]. By taking advantage of bit-width
information during architectural synthesis, area is reduced
by 15-86%, clock speed is improved by 3-249%, and power
consumption is reduced by 46-73% [2].

The optimal bit-widths are the smallest bit-widths
that satisfy the accuracy requirement. They can be
obtained through simulation-based searching or
model-based optimization. Simulation-based bit-width searching
simulates the application with every candidate bit-width
and picks the best solution. It is a straightforward way
to determine the minimal bit-width, but it provides no
intelligent optimization and can consume enormous
computation time, especially when the target application
is a large design or the input space is large.
Model-based bit-width optimization is a better approach:
it eliminates the need for exhaustive simulation and
automatically analyzes and adapts the level of precision
to the needs of the application.


The FP precision modeling methodology presented in
this paper produces an application-adaptive arithmetic model in
the form of a function between the relative error in the
output and the mantissa bit-widths (one for each FP
variable) used in the FP datapath. The model constructed
using this methodology can estimate the output error
range for the custom FP bit-widths used. The
optimization algorithm developed to optimize the
bit-widths combines the popular Steepest Descent
method with the unique characteristics of the bit-width
optimization problem.

2.  A  Methodology  for  FP  Precision  Modeling 

An  arithmetic  precision  model  can  be  built,  based 
on  the  application’s  function  and  data,  to  represent  the 
relationship  between  output  precision  and  bit-widths  used 
in the FP application. Our experimental results show that
the precision model constructed with this method gives
reasonable estimates of the output accuracy. We have
used the precision model in a bit-width optimization
program to obtain optimized, reduced bit-widths.
The methodology for developing such a model
is presented in this section.

The  FP  application  being  analyzed  is  represented  in  a 
graphical  intermediate  format:  the  Control  and  Data  Flow 
Graph  (CDFG).  The  CDFG  is  commonly  used  in  high- 
level  synthesis  and  can  effectively  represent  the 
functional  and  the  structural  description  of  an  application. 
A  CDFG  representation  of  a  differential  equation  solver  is 
shown  in  Figure  1. 


Figure  1.  CDFG  of  a  differential  equation  solver 


The FP precision modeling methodology presented in
this paper yields an application-adaptive arithmetic model.


The  first  process  in  the  precision  modeling  is  called 
behavioral  profiling.  Behavioral  profiling  is  analogous  to 
software profiling. Given a behavioral specification
(CDFG)  of  an  application  and  a  set  of  input  vectors, 
behavioral  profiling  involves  gathering  pertinent  profile 
data  such  as  number  of  times  an  operation  node  is  visited, 
number  of  times  a  conditional  branch  is  taken,  and 
number  of  times  a  loop  or  subprogram  is  executed.  The 
behavioral  profiling  process  involves  a  one-time 
simulation  prior  to  constructing  the  model. 

The  behavioral  profiler  does  the  following.  For  each 
CDFG  node  n,  determine  the  number  of  times  the  node  is 
executed  for  the  given  profiling  stimuli,  and  record  each 
bit  of  the  result  of  the  operation.  Bit  probabilities 
(probability  of  the  bit  being  “1”)  of  the  result,  Pn(i)  can 
be  calculated  based  on  this  information. 
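The bit-probability step can be illustrated with a minimal Python sketch. It assumes the recorded results are IEEE single-precision values and that only the 23 stored mantissa bits are of interest; the function names and the sample list are hypothetical and are not taken from the profiler described here.

```python
import struct

MANTISSA_BITS = 23  # stored mantissa width of IEEE single precision


def mantissa_bits(value):
    """Return the 23 stored mantissa bits of `value` as float32, MSB first."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    frac = raw & ((1 << MANTISSA_BITS) - 1)
    return [(frac >> (MANTISSA_BITS - 1 - i)) & 1 for i in range(MANTISSA_BITS)]


def bit_probabilities(samples):
    """Estimate the probability that each mantissa bit is 1 over the samples."""
    counts = [0] * MANTISSA_BITS
    for value in samples:
        for i, bit in enumerate(mantissa_bits(value)):
            counts[i] += bit
    return [c / len(samples) for c in counts]


# Hypothetical results recorded at one CDFG node during profiling.
profile = [1.5, 1.25, 1.75, 3.0]
probs = bit_probabilities(profile)
print(probs[:3])  # [0.75, 0.5, 0.0]
```

In a real profiler the samples would come from simulating the CDFG with the chosen stimuli, one list per node.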

The  profile  data  is  used  to  construct  a  precision  model 
that  best  reflects  the  functional  relationship  between  the 
bit-width  of  floating-point  operations  and  the  output 
precision.  The  model  constructed  using  the  methodology 
is  an  arithmetic  model  that  describes  the  function  between 
the  output  precision  and  the  bit-widths  used  in  the  FP 
application.  The  overall  error  at  the  output  of  an  operation 
is  composed  of  propagation  error,  which  is  determined  by 
the  errors  of  input  data  and  the  operation  type  only,  and 
rounding  error,  which  is  caused  by  the  rounding  of  the 
operation  result. 

There  are  many  ways  to  estimate  the  bounds  of  the 
rounding  error.  With  the  profile  data  available,  the  most 
accurate  and  convenient  method  for  this  research  is 
presented  in  formula  (1). 


3.  Experiment  Results 


[Plot: error-bitwidth curve of example DIFFEQ (u output); x-axis: number of total bit-width in datapath]


Figure  2.  Comparison  of  estimated  error  and  actual  error 

Figure  2  shows  the  comparison  of  the  error 
estimated  using  the  precision  model  (dashed  line)  and  the 
actual  error  (dots)  for  the  DIFFEQ  example  (shown  in 
Figure 1). The result indicates that the precision models
developed using the methodology can effectively estimate
the error range.

The bit-width optimization problem is solved by a
Grid Steepest Descent (GSD) method derived for this
specific precision modeling methodology and this
bit-width optimization problem. Optimization results for two
examples are shown in Table 1: the DIFFEQ example
and a simple three-multiplication-operation example.


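$$\varepsilon = \frac{z - z'}{z} = \frac{a_{k+1}2^{-(k+1)} + \cdots + a_{22}2^{-22} + a_{23}2^{-23}}{1.0 + a_1 2^{-1} + \cdots + a_{23}2^{-23}} \approx \frac{P_{k+1}2^{-(k+1)} + \cdots + P_{22}2^{-22} + P_{23}2^{-23}}{1.0 + P_1 2^{-1} + P_2 2^{-2} + \cdots + P_{23}2^{-23}} \qquad (1)$$

Here $a_i$ is the i-th mantissa bit of the result, $P_i$ is its bit probability from the profile data, and $k$ is the reduced mantissa bit-width.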
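A minimal Python sketch of how such a rounding-error estimate could be evaluated from bit probabilities follows. It assumes the estimate is the expected value of the discarded mantissa bits divided by the expected value of the full 23-bit mantissa (with the implicit leading 1.0); the function name and the uniform probabilities are illustrative assumptions.

```python
def rounding_error_estimate(probs, kept_bits):
    """Estimate the relative rounding error when only `kept_bits` mantissa bits remain.

    probs[i] is the probability that mantissa bit i+1 (weight 2**-(i+1)) is 1.
    """
    total = len(probs)
    # Expected value of the bits that are cut off.
    dropped = sum(probs[i] * 2.0 ** -(i + 1) for i in range(kept_bits, total))
    # Expected value of the full mantissa, including the implicit leading 1.0.
    full = 1.0 + sum(probs[i] * 2.0 ** -(i + 1) for i in range(total))
    return dropped / full


# Assumed profile: every mantissa bit is 1 half of the time.
uniform = [0.5] * 23
for k in (4, 8, 12, 23):
    print(k, rounding_error_estimate(uniform, k))
```

The estimate shrinks by roughly a factor of two for each extra mantissa bit kept and is zero when all 23 bits are kept.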

The propagation error is derived based on the Mean
Value Theorem. The result is presented in formula (2).



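$$\left|\frac{f(x,y) - f(x',y')}{f(x,y)}\right| \approx \left|\frac{f'_x(x,y)\,x}{f(x,y)}\right|\varepsilon_x + \left|\frac{f'_y(x,y)\,y}{f(x,y)}\right|\varepsilon_y = k_x\varepsilon_x + k_y\varepsilon_y \qquad (2)$$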

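The coefficients $k_x$ and $k_y$ depend on the operation type and also on
the profile data. For operation MULTIPLY (z = x * y),

$$k_x = \left|\frac{f'_x(x,y)\,x}{f(x,y)}\right| = \left|\frac{xy}{xy}\right| = 1.0, \qquad k_y = \left|\frac{f'_y(x,y)\,y}{f(x,y)}\right| = \left|\frac{xy}{xy}\right| = 1.0 \qquad (3)$$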

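For operation DIVIDE (z = x / y),

$$k_x = \left|\frac{f'_x(x,y)\,x}{f(x,y)}\right| = \left|\frac{x/y}{x/y}\right| = 1.0, \qquad k_y = \left|\frac{f'_y(x,y)\,y}{f(x,y)}\right| = \left|\frac{-x/y}{x/y}\right| = 1.0 \qquad (4)$$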


The  arithmetic  model  for  any  FP  operation  is  the 
sum  of  these  two  errors.  The  precision  model  for  the 
entire  application  whose  structural  information  is 
represented  in  the  CDFG  can  be  easily  derived. 
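As an illustration of composing per-operation errors along a datapath, here is a minimal Python sketch. It makes three assumptions that are not taken from the paper: the datapath is a hypothetical chain of three multiplications, the propagation coefficients are both 1.0, and the rounding term is replaced by the simple worst-case truncation bound 2^-b instead of formula (1).

```python
def op_error(err_x, err_y, bits, kx=1.0, ky=1.0):
    """Relative error bound at an operation's output: propagation plus rounding."""
    propagation = kx * err_x + ky * err_y
    rounding = 2.0 ** -bits  # assumed truncation bound, a stand-in for formula (1)
    return propagation + rounding


def chain_model(bitwidths, input_error=0.0):
    """Hypothetical datapath: t1 = a * b, t2 = t1 * c, z = t2 * d."""
    b1, b2, b3 = bitwidths
    e1 = op_error(input_error, input_error, b1)
    e2 = op_error(e1, input_error, b2)
    return op_error(e2, input_error, b3)


print(chain_model((8, 8, 8)))  # 0.01171875, i.e. 3 * 2**-8
```

The resulting function maps a vector of bit-widths to an output error bound, which is the shape of model the optimization step needs.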


Table 1. Optimization results for different precision targets

The  results  demonstrate  that  the  GSD  optimization 
method  can  be  successfully  used  with  the  precision 
models  to  calculate  the  minimal  bit-widths  that  satisfy  the 
user-defined  precision  requirement  of  the  application.  The 
minimal  bit-widths  can  be  the  same  bit-width  for  all 
operations,  or  one  for  each  individual  operation. 
Why Variable Bitwidth Computing?

Obtaining  Optimized  Bitwidths 

-  Others' approach: simulation-based bitwidth
searching

-  Our  approach:  model-based  bitwidth 
optimization 

-  System  flowchart  of  the  model-based  bitwidth 
optimization 


Standard  FP  representation  (Single  Precision):  32  bits 

-  sign:  1  bit 

-  exponent:  8  bits 

-  mantissa:  23  bits 

Implementation  of  standard  FP  operations  in  custom 
circuits  is  expensive 

Mantissa  bitwidth  can  be  reduced  without 
compromising  precision  requirements 
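The single-precision layout and the effect of a shorter mantissa can be tried out with a minimal Python sketch. It assumes truncation (round toward zero) of the stored mantissa and ignores NaN and infinity; the function names are hypothetical.

```python
import struct


def fields(value):
    """Split a float32 into (sign, exponent, mantissa) bit fields: 1, 8 and 23 bits."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


def truncate_mantissa(value, bits):
    """Keep only the top `bits` (1..23) stored mantissa bits of a float32 value."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    drop = 23 - bits
    raw &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the low `drop` mantissa bits
    (out,) = struct.unpack(">f", struct.pack(">I", raw))
    return out


print(fields(1.0))                 # (0, 127, 0)
print(truncate_mantissa(1.75, 1))  # 1.5
value = 3.14159
reduced = truncate_mantissa(value, 8)
print(reduced, abs(value - reduced) / value)
```

With 8 mantissa bits the relative error stays below 2^-8, which already meets a loose accuracy target at about a third of the mantissa storage.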
Experiment results of three examples
Behavioral Profiling


Error  Models  of  FP  Operations 

-  Rounding  Error 

-  Propagation  Error 


•  Constructing  Precision  Model 


•  Behavioral  profiling  gathers  profile  data  through 
one-time  simulation 

-  bit  probability 

-  statistical  values  of  variables  in  data-path 

•  Profiling  is  performed  on  a  graphical  representation 
(usually  CDFG)  of  the  application 

•  Profile  data  is  used  in  precision  modeling 

•  Selecting  stimuli  is  important 


•  Watcher  insertion  &  simulation 


[Diagram labels: Watcher, CDFG node, Data, Stimuli]


•  The overall error of an FP operation:

$$(x \odot y) - fp(\hat{x} \odot \hat{y}) = (x \odot y - \hat{x} \odot \hat{y}) + (\hat{x} \odot \hat{y} - fp(\hat{x} \odot \hat{y}))$$

•  Overall error at the result of an operation:

Propagation Error (PE) + Rounding Error (RE)
Calculation of PE

[Slide labels: Equation, Mantissa, Precise value, FP value, Error]


Establish  error  model  for  each  node  in  data-path 
using  the  RE+PE  model 

Construct precision model for the application
based on data-path structure

The precision model is a function of output error
of the application in terms of bit-widths in the
data-path and input error of the application

The  precision  model  can  be  used  to  predict 
output  error  and  optimize  bitwidths 
•  Experimental Procedure

•  Example  Data-paths  (CDFGs) 

•  Comparison  of  Predicted  Error  Range  and  Actual 
Errors 
[Plot: error-bitwidth curve of example PID; y-axis: relative error]


Problem  Formulation 


Optimization  Methods 

-  Grid Steepest Descent (GSD)

-  Accelerated  Grid  Steepest  Descent  (GSD-A) 


Optimization  Results 


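Precision model based on CDFG and profile data:

$$\varepsilon_{out} = F(b_1, b_2, \ldots, b_N)$$

Objective function:

$$\min \sum_{i=1}^{N} b_i$$

Constraints:

$$F(b_1, b_2, \ldots, b_N) \le P, \qquad b_i \in Z, \qquad Z = [1, 2, \ldots, 23]$$

N: number of nodes in CDFG

b_i: bit-width of node i

P: accuracy requirement

Z: range for bit-width selection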
x: vector of bitwidths
k: search step


Regular Steepest Descent

-  In each step, the direction is calculated and the step
length is determined by searching along that direction

Grid Steepest Descent (GSD)

-  In each step, the step length is fixed (λ = 1) and the
direction is determined by searching
Grid Steepest Descent (GSD)

[Flowchart: Initialize bitwidths (lower bound), Search neighbors, Adopt best direction]

Only one of the bitwidths increases by 1:

λ = 1

d = [0 0 ... 0 1 0 ... 0]
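The GSD steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stopping rule (stop as soon as the model meets the target), the tie-breaking (first best neighbor wins), and the toy precision model are all assumptions.

```python
def grid_steepest_descent(model, n, target, lo=1, hi=23):
    """Grow bit-widths one unit at a time until model(bitwidths) <= target."""
    x = [lo] * n  # initialize every bit-width at the lower bound
    while model(x) > target:
        best, best_err = None, None
        # Search neighbors: each one raises exactly one bit-width by 1.
        for i in range(n):
            if x[i] >= hi:
                continue
            cand = x[:]
            cand[i] += 1
            err = model(cand)
            if best_err is None or err < best_err:
                best, best_err = cand, err
        if best is None:
            raise ValueError("target not reachable within the bit-width range")
        x = best  # adopt the direction with the lowest resulting error
    return x


def toy_model(bits):
    """Assumed stand-in precision model: one 2**-b rounding term per node."""
    return sum(2.0 ** -b for b in bits)


result = grid_steepest_descent(toy_model, 3, 0.05)
print(result, sum(result))  # [6, 6, 6] 18
```

Each iteration costs one model evaluation per node, so the model-based search never needs to simulate the application itself.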


•  Accelerated  GSD  (GSD-A) 

-  “Smart”  Initial  Point 

-  Binary  search  to 
locate  initial  point 

-  Total  search  time  is 
reduced  to  a  fraction 

-  Initial  Point:  All  bitwidths 
have  the  same  initial  value 

-  GSD:  Each  bitwidth  is 
calculated  individually 
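One way to read the acceleration idea is sketched below in Python. The details are assumptions: a binary search finds the largest uniform bit-width that still misses the target, and GSD then refines each bit-width individually from that point. The weighted toy model is hypothetical, and the sketch assumes the model error decreases monotonically as bit-widths grow.

```python
def weighted_model(bits, weights=(4.0, 1.0, 1.0)):
    """Assumed stand-in precision model with one node more sensitive than the others."""
    return sum(w * 2.0 ** -b for w, b in zip(weights, bits))


def uniform_start(model, n, target, lo=1, hi=23):
    """Binary search for the largest uniform bit-width that still misses the target."""
    if model([lo] * n) <= target:
        return [lo] * n
    # Invariant: width `lo` misses the target, width `hi` is assumed to meet it.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if model([mid] * n) <= target:
            hi = mid
        else:
            lo = mid
    return [lo] * n


def gsd(model, start, target, hi=23):
    """Grid steepest descent from `start`: raise one bit-width by 1 per step."""
    x = list(start)
    while model(x) > target:
        candidates = [x[:i] + [x[i] + 1] + x[i + 1:]
                      for i in range(len(x)) if x[i] < hi]
        if not candidates:
            raise ValueError("target not reachable within the bit-width range")
        x = min(candidates, key=model)
    return x


start = uniform_start(weighted_model, 3, 0.05)
result = gsd(weighted_model, start, 0.05)
print(start, result, sum(result))
```

Starting close to the solution replaces most of the unit steps from the lower bound with a handful of binary-search probes, while the final GSD pass still allows individual nodes to end up with different bit-widths.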
Result Comparison (P = 5%)
•  IEEE Format

•  Total
•  Output
•  Validate the optimized bitwidths

-  C++  floating-point  library  supporting  variable  bitwidth 

-  VHDL  Variable  precision  floating-point  component  library, 
developed  by  Rapid  Prototyping  Lab  at  Northeastern 
University,  available  under  GPL  at 

http://www.ece.neu.edu/groups/rpl/projects/floatingpoint/ 

•  Improve  error  models 

-  Propagation  error  models  and  rounding  error  models 

-  Singularity  issues 

•  Integrate  in  high  level  synthesis  flow 

-  IEEE 1076.3 working group: variable bitwidth
floating-point for synthesis


Variable  bitwidth  FP  computing  is  viable 


Model-based  bitwidth  optimization  has 
advantages  over  simulation-based  searching 

A  methodology  of  FP  precision  modeling  has 
been  developed 

The  precision  model  predicts  output  error  and 
can  be  used  for  bit-width  optimization 
•  A customized optimization algorithm, Grid
Steepest  Descent  (GSD),  has  been  developed 

•  Search  acceleration  techniques  have  been  applied 
to  GSD 

•  Optimized  bitwidths  for  a  given  precision  target 
can  be  found  quickly 


Sum  of  the  optimized  bitwidths  is  significantly 
smaller  than  that  of  standard  IEEE  format 


